This project is available on GitHub.
The data is fictitious, produced by the generate R script.
The data has challenges built in.
The biographical data has 14 variables and 100,000 observations. The data is stored at the donor level: each row represents a unique donor and holds biographical information about that donor.
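One way to confirm the donor-level grain is to check that no id repeats. The sketch below is only an illustration. The data frame name `bio` is hypothetical, and it holds just the first four id and household_id values shown in the output in this document, not the full file.

```r
# Hypothetical stand-in for the biographical data: the first four rows
# shown in this document, not the full 100,000-row file.
bio <- data.frame(
  id           = c(8275707, 2963581, 4302254, 7637444),
  household_id = c(1000235, 1000235, 1000303, 1000341)
)

# Donor level means one row per id, so no id should appear twice.
ids_unique <- anyDuplicated(bio$id) == 0

# household_id is not a row key: in the sample, two donors share 1000235.
donors_per_household <- table(bio$household_id)

print(ids_unique)
print(donors_per_household)
```

On the real data the same `anyDuplicated()` check would run against all 100,000 rows.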
There are 4 numeric variables:
## Rows: 100,000
## Columns: 4
## $ id <dbl> 8275707, 2963581, 4302254, 7637444, 9369155, 1026439, 65…
## $ household_id <dbl> 1000235, 1000235, 1000303, 1000341, 1000341, 1000435, 10…
## $ lat <dbl> 34.03, 41.29, NA, 36.07, 26.23, 33.60, 40.99, 38.82, 32.…
## $ lon <dbl> -117.75, -92.63, NA, -94.15, -80.13, -117.71, -74.34, -7…
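A grouping like the one above can be reproduced by testing each column's type. The sketch below uses base R only and assumes a small hypothetical data frame `bio` built from the first three rows shown in this document. It also counts rows with a missing coordinate, since lat and lon are NA for at least one donor in the output.

```r
# Hypothetical three-row sample; values copied from the output in this document.
bio <- data.frame(
  id       = c(8275707, 2963581, 4302254),
  lat      = c(34.03, 41.29, NA),
  lon      = c(-117.75, -92.63, NA),
  country  = c("United States", "United States", "China"),
  birthday = as.Date(c("1923-11-18", "1925-03-18", "1924-08-28")),
  stringsAsFactors = FALSE
)

# Names of the columns stored as numbers (Date columns are not counted).
numeric_cols <- names(bio)[vapply(bio, is.numeric, logical(1))]

# Rows where either coordinate is missing.
missing_coords <- sum(is.na(bio$lat) | is.na(bio$lon))

print(numeric_cols)
print(missing_coords)
```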
When loaded with default settings, there are 9 character variables:
## Rows: 100,000
## Columns: 9
## $ name <chr> "al-Shakoor, Labeeb", "Nero, Brianna", "al-Rasheed, R…
## $ country <chr> "United States", "United States", "China", "United St…
## $ city <chr> "Pomona", "Oskaloosa", "Shenzhen", "Fayetteville", "P…
## $ deceased <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
## $ zip <chr> "91766", "52577", NA, "72701", "33069", "92653", "074…
## $ state <chr> "CA", "IA", NA, "AR", "FL", "CA", "NJ", "VA", "TX", N…
## $ capacity <chr> "$50k - $75K", "$1k - $2.5k", "$5k - $10k", "$2.5k - …
## $ capacity_source <chr> "screening", "screening", "screening", "institutional…
## $ race <chr> "Non-Hispanic white", "Non-Hispanic white", "Asian", …
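Some of these character columns may be easier to work with after conversion. The sketch below is one possible approach, not the project's own method. It assumes a hypothetical sample `bio` built from the first three rows shown above, and it assumes that any deceased value other than "Y" means not deceased (only "Y" is visible in the output). zip is left as character, because converting it to a number would drop any leading zeros.

```r
# Hypothetical three-row sample; values copied from the output in this document.
bio <- data.frame(
  deceased = c("Y", "Y", "Y"),
  zip      = c("91766", "52577", NA),
  race     = c("Non-Hispanic white", "Non-Hispanic white", "Asian"),
  stringsAsFactors = FALSE
)

# Assumption: "Y" marks a deceased donor; anything else becomes FALSE,
# and a missing value stays NA.
bio$deceased <- bio$deceased == "Y"

# A column with a small set of repeated labels can be stored as a factor.
bio$race <- factor(bio$race)

# zip is deliberately left untouched as character.
str(bio)
```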
There is 1 date variable:
## Rows: 100,000
## Columns: 1
## $ birthday <date> 1923-11-18, 1925-03-18, 1924-08-28, 1923-05-14, 1921-10-11,…
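Because birthday is stored as a date, year and age calculations work without parsing strings. The sketch below derives a birth year and an age in whole years from the first three birthdays shown above. The reference date `as_of` is a hypothetical value chosen for illustration.

```r
# First three birthdays from the output in this document.
birthday <- as.Date(c("1923-11-18", "1925-03-18", "1924-08-28"))

# Hypothetical reference date for the age calculation.
as_of <- as.Date("2020-01-01")

birth_year <- as.integer(format(birthday, "%Y"))

# Has the birthday already occurred in the reference year? Compare month-day.
had_birthday <- format(as_of, "%m%d") >= format(birthday, "%m%d")

# Whole years: year difference, minus one if the birthday is still ahead.
age <- as.integer(format(as_of, "%Y")) - birth_year - ifelse(had_birthday, 0L, 1L)

print(data.frame(birthday, birth_year, age))
```

Comparing month and day avoids the drift that comes from dividing a day count by 365.25.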